Abstract

We analyze the existence of community structures in two different social networks obtained 
from similarity and collaborative features between musical artists. Our analysis reveals some 
characteristic organizational patterns and provides information about the driving forces behind the 
growth of the networks. In the similarity network, we find a strong correlation between clusters of 
artists and musical genres. On the other hand, the collaboration network shows two different kinds 
of communities: rather small structures related to music bands and geographic zones, and much 
bigger communities built upon collaborative clusters with a high number of participants related 
through the period the artists were active. Finally, we detect the leading artists inside their 
corresponding communities and analyze their roles in the network by looking at a few topological 
properties of the nodes. 

PACS numbers: 05.45.Xt,42.55.Px,42.65.Sf 
Music is one of the richest sources of interaction between individuals. Besides
the usual connections between artists and listeners, it is possible to have
artist-artist and listener-listener relations. In the current work we analyze 
artist-artist interactions and their implications in music similarity and collabo- 
ration. To that end, we construct two different networks where nodes represent 
musical artists: the similarity network, where artists are linked if a certain sim- 
ilarity exists between them (evaluated by musical editors) and the collaboration 
network, where a link exists between two artists if they have ever performed 
together. We detect and analyze the internal communities that spontaneously 
arise in both networks, which are driven by musical/social "forces", and show 
that the appearance of these communities is strongly related to the existence of 
musical genres. Furthermore, we are able to discriminate the main actors in the 
formed structures and extract their role in the network through the calculation 
and classification of a few topological properties of the nodes. 



I. INTRODUCTION 

Since the seminal paper of Milgram [1] investigating the flow of information through
acquaintance networks, social (complex) networks have attracted the interest of scientists
in a variety of fields [2]. Many kinds of social structures arise when analysing the different
types of interdependency among individuals (or organizations), such as financial exchange,
friendship, kinship, sexual relations or disease transmission. In the current work we focus
on those social networks where music is the driving force that generates interaction between
individuals. Specifically, we consider musical artists as the fundamental nodes of the network
and a certain musical relation as the linking rule. Two different types of networks are
obtained: first, the similarity network, where artists are linked if their music is somewhat
similar, and second, the collaboration network, where artists are linked if they have ever
performed together. The relevance of these kinds of networks lies not only in a social
science perspective but also in musical aspects, such as the understanding of musical genres
[3, 4] or music recommendation.

Networks are obtained from the All-Music database of music metadata. The content
of the database is created by professional editors and writers. Although the linking rule
is clear for the collaboration network, evaluating the similarity between artists is a more
complex task. A great deal of research is devoted to the development of audio
content-based algorithms capable of quantifying similarity between musical pieces. Although
great advances have been made in this field, the criterion of musical experts still prevails
over similarity software. If we translate the problem from musical pieces to musical artists
[10], the evaluation of musical similarity becomes a subjective task where expert musical
editors have the last say.

FIG. 1: (a) and (c) show the cumulative degree distribution $P_c(k)$ of the similarity and collaboration
networks, respectively (note the different scale in the X-axis); (b) and (d) are their corresponding
nearest neighbor degree distributions $k_{nn}(k)$. Parameters shown in the table are: number of nodes
(n), number of links (m), highest degree ($k_{max}$), diameter of the network (D), mean shortest path
($\langle d \rangle$), clustering coefficient (C) and Pearson correlation coefficient (r) [15].



The intersection between both networks has been recently analyzed [11] from a complex
network perspective [12]. In the current work we go one step further by studying the
structures that arise in the spontaneous organization of these particular social networks.
Specifically, we are interested in the existence and characterization of communities inside
the network and the driving forces that induce their appearance. We also show how different
kinds of community structures arise at different partition levels and how they are related to
the existence of musical genres (in the case of the similarity network) and inter/intra band
collaboration (in the case of the collaboration network). Figure 1 summarizes the main
parameters of the networks together with the cumulative degree distributions $P_c(k)$ and the
nearest neighbor degree distributions $k_{nn}(k)$. Although both networks share a small-world
topology, there exist differences in their degree distribution and assortativity.
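The two distributions summarized in Figure 1 can be illustrated with a minimal sketch. It assumes that the cumulative distribution $P_c(k)$ is the fraction of nodes with degree at least k (the text does not state the convention) and that $k_{nn}(k)$ is the mean neighbor degree averaged over all nodes of degree k. The function name and the toy edge list are hypothetical and are not taken from the All-Music data.

```python
from collections import defaultdict


def degree_statistics(edges):
    """Return (p_c, k_nn) for an undirected simple graph given as an edge list.

    p_c[k]  : fraction of nodes with degree >= k (assumed convention).
    k_nn[k] : mean degree of the neighbors of the nodes that have degree k.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    degree = {node: len(nbrs) for node, nbrs in adj.items()}
    n = len(degree)

    p_c = {k: sum(1 for d in degree.values() if d >= k) / n
           for k in sorted(set(degree.values()))}

    # Mean neighbor degree per node, then averaged over nodes of equal degree.
    per_degree = defaultdict(list)
    for node, nbrs in adj.items():
        per_degree[degree[node]].append(sum(degree[x] for x in nbrs) / degree[node])
    k_nn = {k: sum(vals) / len(vals) for k, vals in sorted(per_degree.items())}
    return p_c, k_nn


# Hypothetical toy graph: a hub "a" with three neighbors, two of them linked.
toy_edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
p_c, k_nn = degree_statistics(toy_edges)
```

A $k_{nn}(k)$ that decreases with k, as in this toy graph, is the signature of a disassortative network; an increasing one indicates assortative mixing.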



II. COMMUNITY DETECTION 



Detection of communities in complex networks has gained a lot of attention during recent
years, a fact reflected in the existence of several community-detection algorithms.
Among them, we have selected the Girvan-Newman (GN) algorithm [16] for its balance
between effectiveness and time consumption. As we will explain later, the GN algorithm is valid only
for low to moderate values of the inter-community connections, which is the case of the
networks analyzed here.

FIG. 2: Dendrogram of communities detected in the similarity network when applying the GN
algorithm. In every step (left column) a cluster (community) splits out from the network.
[Figure residue: the root is the entire network and steps 1 to 41 run down the left column. Branch
labels include Hip-Hop, "Rock", "Jazz", Cajun & Zydeco, Gospel, Blues, Opera, Hard Rock, R&B/Pop,
Country / Folk, Celtic, Country / Rock, House-Dance / Electronic, Avant Garde, World & International,
Latin, New Age, Standards, Swing, Latin Jazz, Big-bands / Swing, Easy Listening, Female Vocal Blues,
Bolero / Mambo, Bop Jazz, Bop, Free Jazz / Avant Garde, Spoken Comedy / Rock, Funk / 70's / Rock
Oldies, Modern Electric Blues and Folk & Roots.]

The GN algorithm is based on the sequential removal of those links with the highest 
betweenness, which is measured as proportional to the number of shortest paths running 
along each link [20]. This way, the network breaks into isolated clusters (communities) 
which, in turn, can be further split in successive steps. In Figure 2 we plot this evolution
for the similarity network. In order to understand the emergent communities, we use the
fact that the All-Music database tags each artist as belonging to one or more genres and we
choose the most frequent tag to label each community. We can identify the first split as a 
hip-hop community, followed by the division into two main groups dominated by "rock" and 
"jazz" artists respectively. In subsequent divisions there appear genres such as Blues, Opera 
or Hard Rock from the former "rock" community, and Jazz, Latin-Bolero and Standards 
from the Jazz community. 
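The link-removal procedure described above can be sketched in a few lines. This is an illustrative, unoptimized version under stated assumptions, not the implementation used for the results in this paper: it recomputes shortest-path edge betweenness after every removal, removes a single highest-scoring link at a time, and stops as soon as one more cluster has split off. All function names and the toy graph are hypothetical.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def edge_betweenness(adj):
    """Shortest-path betweenness of every undirected link (Brandes-style)."""
    score = defaultdict(float)
    for source in adj:
        # Breadth-first search that counts shortest paths from `source`.
        dist = {source: 0}
        paths = defaultdict(float)
        paths[source] = 1.0
        preds = defaultdict(list)
        order = []
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Propagate path fractions back, starting from the farthest nodes.
        credit = defaultdict(float)
        for w in reversed(order):
            for v in preds[w]:
                share = paths[v] / paths[w] * (1.0 + credit[w])
                score[frozenset((v, w))] += share
                credit[v] += share
    return score  # every path is counted once from each endpoint


def components(adj):
    seen, parts = set(), []
    for start in adj:
        if start in seen:
            continue
        part, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in part:
                part.add(v)
                stack.extend(adj[v] - part)
        seen |= part
        parts.append(part)
    return parts


def split_once(adj):
    """Remove highest-betweenness links until one more cluster appears."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    target = len(components(adj)) + 1
    while len(components(adj)) < target:
        scores = edge_betweenness(adj)
        if not scores:  # no links left to remove
            break
        u, v = max(scores, key=scores.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)


# Hypothetical toy graph: two triangles joined by the single link (3, 4).
toy_edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
clusters = split_once(build_adjacency(toy_edges))
```

Calling `split_once` repeatedly on the resulting clusters would reproduce the successive splits of a dendrogram; the recomputation after each removal is what makes the procedure costly on large networks.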

In order to quantify the quality of the divisions we compute the modularity Q of each
partition. As explained in [20], a modularity Q = 0 indicates that the detected community
structure is similar to the one existing in an equivalent random network or, in other words,
links between nodes are randomly distributed and they are not related to the existence of
certain cliques inside the network. On the contrary, values approaching Q = 1, which is the
maximum, indicate strong community structure.
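As a rough illustration of how Q could be evaluated for a given partition, the sketch below uses the standard Newman-Girvan form, Q = sum over communities of (l_s / m) - (d_s / 2m)^2, where l_s is the number of links inside community s, d_s is the sum of the degrees of its nodes and m is the total number of links. The formula is not written out in the text, so its use here is an assumption; the function name and the toy data are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman-Girvan modularity of a partition of an undirected graph."""
    m = len(edges)
    inside = defaultdict(int)      # l_s: links with both ends in community s
    degree_sum = defaultdict(int)  # d_s: sum of degrees of the nodes in s
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            inside[cu] += 1
    return sum(inside[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Hypothetical toy graph: two triangles joined by the single link (3, 4).
toy_edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
two_groups = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
one_group = {node: "A" for node in two_groups}

q_split = modularity(toy_edges, two_groups)
q_whole = modularity(toy_edges, one_group)
```

Placing every node in a single community gives Q = 0 by construction, which matches the random-network baseline described above.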



Figure 3 shows the evolution of the modularity Q [20] as both networks are divided into
independent clusters (by removal of links with the highest betweenness). We can observe the
existence of sudden increments of Q related to different satisfactory network partitions. As
reported in [20], the absolute maximum is not always associated with the best partition and,
therefore, each of these large jumps in Q must be analyzed independently with regard to
the nature of the data.

FIG. 3: Modularity Q of the communities (insets) as the GN algorithm splits the similarity (a)
and collaboration (b) networks. In the main plots, we have zoomed in on the region indicated in
the insets, which corresponds to the maximum of the Q evolution. Dashed lines indicate sudden
increments of Q.

As we saw in the dendrogram of the similarity network (Figure 2), the possible partitions
are related to the genre classification of the artists belonging to each detected community.
The maximum value of Q (Q = 0.79) appears when the network is split into 41 communities,
all of them related to musical genres or styles within those genres. However, the most
significant partition is observed when the network is divided into 15 communities (Q ≈ 0.68)
since each community can easily be described by a well defined musical genre. Further
divisions of this network are mainly dominated by the appearance of different styles inside
each genre.
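The labelling rule used for the similarity communities (each community takes the genre tag that occurs most often among its artists) could be sketched as follows. The artist identifiers, the tags and the function name are hypothetical placeholders, and ties between equally frequent tags are not handled in any particular way.

```python
from collections import Counter


def label_communities(communities, genres_of):
    """Label each community with its most frequent genre tag.

    communities : {community id: list of artists}
    genres_of   : {artist: list of genre tags (one or more per artist)}
    """
    labels = {}
    for name, artists in communities.items():
        counts = Counter(tag for artist in artists for tag in genres_of.get(artist, ()))
        labels[name] = counts.most_common(1)[0][0] if counts else None
    return labels


# Hypothetical toy data, not taken from the All-Music database.
communities = {
    "c1": ["artist_1", "artist_2", "artist_3"],
    "c2": ["artist_4", "artist_5"],
}
genres_of = {
    "artist_1": ["Rock", "Blues"],
    "artist_2": ["Rock"],
    "artist_3": ["Blues", "Rock"],
    "artist_4": ["Jazz"],
    "artist_5": ["Jazz", "Latin"],
}
labels = label_communities(communities, genres_of)
```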

In the case of the collaboration network the maximum appears for 81 communities with 
a Q = 0.76. In this case, the interpretation of the existing communities is more complex 
since several factors such as generational overlapping, geographical proximity, genre affinity, 
or, simply, the existence of music bands, induce community formation. 

It is worth mentioning that the obtained values of modularity reveal a strong community
structure. In all the mentioned cases the percentage of inter-community links was always
less than 17%. If we compare with the toy networks used to evaluate community detection
algorithms, we see that these values of inter-community links correspond to the region
where the GN algorithm is as good as the others. This conclusion is also supported if we
look at the inset of Figure 3 in [19], where the authors show that values of modularity Q
greater than 0.5 correspond to a region where the GN algorithm performs accurately. All
this evidence supports the use of the GN algorithm as a suitable community detector in
these kinds of networks.
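The percentage of inter-community links quoted above is straightforward to check for any partition. A minimal sketch, with a hypothetical function name and toy data:

```python
def inter_community_fraction(edges, community_of):
    """Fraction of links whose two endpoints lie in different communities."""
    crossing = sum(1 for u, v in edges if community_of[u] != community_of[v])
    return crossing / len(edges)


# Hypothetical toy graph: two triangles joined by the single link (3, 4).
toy_edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
partition = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
fraction = inter_community_fraction(toy_edges, partition)
```

Here one link out of seven crosses the partition, i.e. about 14%, which would fall below the 17% bound mentioned in the text.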

In Fig. 4 we plot the most significant partitions detected by the community structure
algorithm, i.e., a division into 15 communities in the case of the similarity network (left
plot) and 41 communities in the collaboration network (right plot). Since each cluster of the
similarity network is related to a certain musical genre, we assign different colors to each
community and we keep them in the collaboration clusters. This way, we can observe how
musical genres spread among the collaboration clusters and we can compare the relation
between genres and collaborations. Concerning the collaboration network, two kinds of
communities are detected, one with a small number of nodes corresponding to the existence
of bands and geographic zones, and the other related to certain collaboration communities,
where jazz artists are the most interactive nodes. It is remarkable that two of the largest
collaboration communities (3 and 5) are mainly formed by jazz players, a community of
artists that presents a high degree of collaboration. We identify two kinds of "collaborators"
in these big communities, one related to artists who usually played in several bands during
their career (e.g., John Coltrane or Stan Getz) and the other related to jazz artists who
usually perform as session musicians given that they are experts in one particular instrument (e.g.,
Paulinho Da Costa or Ron Carter). Furthermore, these two largest communities correspond
to different generations of jazz players, community 3 to the 1920s-1940s and community 5
to the 1950s-1960s. Interestingly, the community of jazz artists who performed together between
1912 and 1940 (which would correspond to community 3 of Fig. 4) was previously studied in
[25].

FIG. 4: Detected communities for the similarity (left) and collaboration (right) networks. Colors,
which correspond to different musical genres (similarity communities), are introduced to help
comparison between similarity and collaboration communities.



III. ROLE CLASSIFICATION 

Once the existing communities have been identified, we try to infer the artists' roles
inside their communities by mere inspection of the network topology. Recently, Guimerà et
al. have introduced a classification of node functionality based on the connectivity
of nodes within the community structure. Two properties of the node connectivity, based
on the inter/intra community connections, are checked. One is the within-module degree $z_i$,
which accounts for the connections of the node inside its community, and is defined as:

$$ z_i = \frac{k_i - \bar{k}_{s_i}}{\sigma_{k_{s_i}}} \qquad (1) $$

where $k_i$ is the degree of node i, $\bar{k}_{s_i}$ is the mean degree inside the community $s_i$ of node
i and $\sigma_{k_{s_i}}$ is the standard deviation of k in $s_i$. High values of $z_i$ reflect that node i is
a well connected node inside its community (i.e., a hub), while negative values indicate a
connectivity below the average (peripheral nodes).

Another characteristic to be evaluated is how the links of a certain node are distributed
between the communities. This is measured using the participation coefficient $P_i$, which
accounts for the inter-community link distribution of node i:

$$ P_i = 1 - \sum_{s=1}^{N_M} \left( \frac{k_{is}}{k_i} \right)^2 \qquad (2) $$

where $N_M$ is the total number of communities, $k_{is}$ is the number of links of node i that are
connected to nodes in community s and $k_i$ is the total degree of node i. The participation
coefficient ranges from zero (all links inside its own community) to close to unity (all links
equally distributed among all communities).
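Equations (1) and (2) can be evaluated directly from an edge list and a community assignment. The sketch below makes two assumptions that are not fixed by the text: in Eq. (1) the degree $k_i$ counts only the links of node i that stay inside its own community (consistent with the description of $z_i$ as a within-module quantity), and the standard deviation is the population one. Nodes in a community with zero spread are assigned $z_i = 0$. All names and the toy data are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def node_role_coordinates(edges, community_of):
    """Return {node: (P_i, z_i)} for an undirected graph and a partition."""
    # links_to[i][s] is k_is, the number of links from node i into community s.
    links_to = defaultdict(lambda: defaultdict(int))
    for u, v in edges:
        links_to[u][community_of[v]] += 1
        links_to[v][community_of[u]] += 1

    members = defaultdict(list)
    for node, comm in community_of.items():
        members[comm].append(node)

    coords = {}
    for comm, nodes in members.items():
        # Assumption: Eq. (1) uses the within-community degree of each node.
        inside = {i: links_to[i][comm] for i in nodes}
        avg = mean(inside.values())
        spread = pstdev(inside.values())
        for i in nodes:
            k_i = sum(links_to[i].values())  # total degree of node i
            z = (inside[i] - avg) / spread if spread > 0 else 0.0
            p = 1.0 - sum((k_is / k_i) ** 2 for k_is in links_to[i].values()) if k_i else 0.0
            coords[i] = (p, z)
    return coords


# Hypothetical toy graph: two triangles joined by the single link (3, 4).
toy_edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
partition = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
coords = node_role_coordinates(toy_edges, partition)
```

In this toy graph node 3 has two links inside its community and one outside, so its participation coefficient is 1 - (4/9 + 1/9) = 4/9, while node 1 has all links inside and obtains zero.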

FIG. 5: Position of nodes in the two dimensional space $(P_i, z_i)$ for the similarity network (left) and the
collaboration network (right). The seven divisions of the two dimensional space used to classify node
roles are shown explicitly.

In the role classification proposed by Guimerà et al. the functionality is obtained by
analyzing the position of nodes in a two dimensional space given by $(P_i, z_i)$. Nodes with z >
2.5 are considered hubs and z < 2.5 are non-hubs. The two dimensional space representation
is divided into seven regions, four of them for non-hub nodes: (R1) ultra-peripheral nodes, i.e.,
nodes with few connections which belong, in turn, to a unique community, (R2) peripheral
nodes, which are nodes with few links outside their community, (R3) non-hub connector
nodes, i.e., nodes with several connections to other communities, and (R4) non-hub kinless
nodes, with their links homogeneously distributed among all communities. The other three
regions divide the types of hubs into: (R5) provincial hubs, i.e., hubs with a large number
of their links inside their community, (R6) connector hubs, which distribute around 50% of
their links in several communities and (R7) kinless hubs, whose links are homogeneously
distributed among all communities.
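The seven regions can be expressed as a small lookup on $(P_i, z_i)$. The text only gives the hub threshold z = 2.5; the boundaries on $P_i$ used below (0.05, 0.62 and 0.80 for non-hubs, 0.30 and 0.75 for hubs) are the values commonly quoted for the original cartography of Guimerà et al. and are an assumption here, since this paper does not list them. Treating z exactly equal to 2.5 as a hub is also an assumption.

```python
def classify_role(p, z, hub_threshold=2.5):
    """Map a node's (P_i, z_i) coordinates to one of the regions R1..R7.

    The P_i boundaries are assumed values, not stated in the text.
    """
    if z >= hub_threshold:
        if p <= 0.30:
            return "R5"  # provincial hub
        if p <= 0.75:
            return "R6"  # connector hub
        return "R7"      # kinless hub
    if p <= 0.05:
        return "R1"      # ultra-peripheral node
    if p <= 0.62:
        return "R2"      # peripheral node
    if p <= 0.80:
        return "R3"      # non-hub connector node
    return "R4"          # non-hub kinless node


# Hypothetical coordinates, chosen only to exercise each region.
examples = [(0.0, 3.0), (0.5, 3.0), (0.9, 3.0), (0.0, 0.0), (0.4, -1.0), (0.7, 0.0), (0.85, 0.0)]
roles = [classify_role(p, z) for p, z in examples]
```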

In our particular case, we use this classification (after ensuring that it works correctly
in our network) in order to identify the central nodes of each community, i.e., the most
influential artists within a particular musical genre, and also those artists who, due to
their versatility, link two or more musical genres.

Figure 5 shows the position of nodes in the two dimensional space $(P_i, z_i)$ for both
networks. Provincial hubs of the similarity network (R5) are references in their musical genres.
In this category, we find artists such as Elvis Presley, Elton John, Bruce Springsteen, The
Rolling Stones, Whitney Houston, Madonna, Joe Satriani, Axl Rose, John Coltrane or Gil
Evans. On the other hand, there exist artists who are references in their communities but
who also stand out for having performed in two or more genres. These artists belong to the
R6 category (connector hubs), which includes names such as Stevie Wonder, Eric Clapton, Aretha
Franklin, Anita Baker, James Ingram, Sting, David Bowie, Frank Sinatra, Vangelis, Blind
Blake, Robin Zander or Adrian Belew.

In Fig. 6 and Fig. 7 we plot a cartographic representation of the seven largest
communities within the similarity network (Fig. 6) and the five largest ones in the collaboration
network (Fig. 7), where provincial (R5) and connector (R6) hubs have been explicitly
indicated (the rest have been omitted in order to ease the reading). This representation allows
us to identify not only the artists who are references of each musical genre or collaboration
clique but also those who act as bridges between communities.
FIG. 6: Cartographic representation of the similarity communities. Due to space limits, only
the seven largest communities have been plotted. Provincial hubs (R5, green) and connector hubs
(R6, red) have been indicated, in order to show leading artists inside each community and also those
artists that act as bridges between musical genres.
As an example, within the Rock community we can observe how Eric Clapton is a
connector hub that links the Rock genre with the Blues and Jazz communities. Therefore, Eric
Clapton is an internal connector of the Rock community. Another kind of connector hub is
Blind Blake, who belongs to the Blues cluster. This artist is an external connector of the
Rock community, since he is one of the bridges between the Blues and Rock communities.
This type of representation provides an objective mechanism for classifying the function of
leading artists inside their musical communities by using topological properties of the network
and, furthermore, for quantifying connections between different musical genres.
IV. CONCLUSIONS



We have shown that the identification of community structures within music networks
is a useful tool in order to evaluate the existence of musical cliques and to identify the
role of leading artists inside each community. In the case of music similarity networks we
have observed that the detected communities are mainly related to musical genres, while
the collaboration network presents communities related to artists' generations, geographical
constraints, genre affinity or music bands. In the collaboration network, for example, jazz
players are the most active artists and give rise to the appearance of large communities
related to different generations. Finally, we have studied a method to identify the leading
artists of each community and the internal/external connector hubs, who act as bridges
between different musical genres. The information obtained from the community analysis
could be a useful tool not only to evaluate the role or relevance of a given artist but also to
improve the performance of music recommendation systems.